    Improving HIV Patients’ Appointment Adherence Using a Predictive Model

    Although much has been done to reduce the transmission of HIV, 36.7 million individuals are still living with the virus worldwide, 1.1 million of them in the US [1]. The University of Alabama at Birmingham (UAB) operates a specialty HIV clinic, the 1917 clinic, which has helped over 12,000 patients in their HIV healthcare journey since 1988. One way to reduce the prevalence of HIV is to engage patients in their care: patients who attend their clinic visits regularly have been shown to have improved medication adherence, reduced mortality, and improved health outcomes [2]. Our clinic experienced a 19% visit no-show rate in fiscal year 2018. Identifying patients at risk for no-show visits is important for clinic scheduling, staffing, and resource allocation, and is the first step toward implementing strategies to decrease no-shows. The data used for this project consisted of appointment records for 4,950 HIV-positive patients seen between October 2015 and July 2019 at UAB's HIV clinic. The data source was UAB's scheduling software, IDX®. To ensure data accuracy, automate data pre-processing, and produce a list of patients to be contacted, a relational database was developed in MS SQL Server using IDX® data, and an automated SSIS package handled the extract, transform, and load (ETL) processes. A predictive logistic regression model (predicting no-show: yes vs. no) was developed using the data from the database. The data were split into training (80%, or 64,354 records) and testing (20%, or 21,450 records) sets prior to deploying the model. A probability score for each patient was generated from the trained logistic regression model using the independent variables: previous no-show/total-appointments ratio, age, race, lag days (the days between the scheduling and appointment dates), and gender.
A threshold value, above which a patient was considered likely to no-show, was selected on the basis of analysis of the receiver operating characteristic (ROC) curve, which showed the distribution of probability scores. The model was then evaluated for precision, sensitivity, and their harmonic mean (F1). Among the 21,450 testing records, no-show probability scores ranged from 0.09 to 0.83. A threshold of 0.3 was chosen based on its comparatively lower false negative (FN) count of 1,549. The true positive (TP) count, those patients predicted to no-show who actually did no-show, was 6,435. The model achieved 44% precision (positive predictive value) and 80% sensitivity, with an F1 of 57%. The cutoff point emphasized sensitivity because we preferred to bring the potential of an intervention to more patients than strictly needed. By using predictive algorithms such as ours, healthcare organizations will be better positioned to identify at-risk patients and offer interventions that engage them in longitudinal clinic care, with the ensuing improvements in morbidity and mortality.
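The pipeline the abstract describes, an 80/20 split, logistic regression on the five predictors, and a lowered 0.3 classification threshold to favor sensitivity, can be sketched as follows. This is a minimal illustration on synthetic data: the feature encodings and the data-generating rule are assumptions, not the clinic's actual IDX®-derived dataset or model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score

# Hypothetical synthetic appointment data; feature names mirror the
# abstract's predictors, but the values are fabricated for illustration.
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([
    rng.uniform(0, 1, n),      # previous no-show / total-appointments ratio
    rng.integers(18, 80, n),   # age
    rng.integers(0, 3, n),     # race (coded)
    rng.integers(0, 120, n),   # lag days between scheduling and visit
    rng.integers(0, 2, n),     # gender (coded)
])
# Assumed outcome loosely driven by the prior no-show ratio.
y = (rng.uniform(0, 1, n) < 0.15 + 0.5 * X[:, 0]).astype(int)

# 80/20 train/test split, as in the abstract.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Classify at a 0.3 probability threshold instead of the default 0.5,
# trading precision for sensitivity as the abstract describes.
proba = model.predict_proba(X_te)[:, 1]
pred = (proba >= 0.3).astype(int)

precision = precision_score(y_te, pred, zero_division=0)
sensitivity = recall_score(y_te, pred)
f1 = f1_score(y_te, pred)
```

Lowering the threshold below 0.5 is what shifts the operating point along the ROC curve toward fewer false negatives at the cost of more false positives.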

    A Practical and Empirical Comparison of Three Topic Modeling Methods Using a COVID-19 Corpus: LSA, LDA, and Top2Vec

    This study was prepared as a practical guide for researchers interested in using topic modeling methodologies, especially those who have difficulty determining which methodology to use. Many topic modeling methods have been developed since the 1980s, namely latent semantic indexing or analysis (LSI/LSA), probabilistic LSI/LSA (pLSI/pLSA), naïve Bayes, the Author-Recipient-Topic model (ART), Latent Dirichlet Allocation (LDA), Topics Over Time (TOT), Dynamic Topic Models (DTM), Word2Vec, Top2Vec, and variations and combinations of these techniques. Researchers from disciplines other than computer science may find it challenging to select a topic modeling methodology. We compared a recently developed topic modeling algorithm, Top2Vec, with two of the most conventional and frequently used methodologies, LSA and LDA. As a study sample, we used a corpus of 65,292 COVID-19-focused abstracts. Across the 11 topics we identified with each methodology, we found the highest levels of correlation between LDA and Top2Vec results, followed by LSA and LDA, and then Top2Vec and LSA. We also report the computational resources used to perform the analyses and provide practical guidelines and recommendations for researchers.
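The two conventional methods compared above can be sketched with scikit-learn on a toy corpus: LSA as a truncated SVD over a TF-IDF matrix, and LDA as probabilistic topic mixtures over raw term counts. The corpus and topic count here are stand-ins (the study used 65,292 abstracts and 11 topics); Top2Vec is omitted because it relies on document embeddings from the separate top2vec package.

```python
from sklearn.decomposition import LatentDirichletAllocation, TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Toy stand-in for the COVID-19 abstract corpus.
docs = [
    "vaccine trial immune response antibody",
    "vaccine dose antibody immune efficacy",
    "lockdown policy economy mental health",
    "policy lockdown school economy impact",
]

# LSA: truncated SVD over TF-IDF weights; rows are document-topic scores.
tfidf = TfidfVectorizer().fit_transform(docs)
lsa_doc_topics = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)

# LDA: generative model over raw counts; rows are document-topic
# probability distributions (each row sums to 1).
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda_doc_topics = lda.fit_transform(counts)
```

A practical difference visible even here: LDA's outputs are interpretable as probabilities, while LSA's SVD scores can be negative and need separate interpretation.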

    A critical analysis of COVID-19 research literature: Text mining approach

    Objective: Among the stakeholders of COVID-19 research, clinicians in particular experience difficulty keeping up with the deluge of SARS-CoV-2 literature while performing their much-needed clinical duties. By revealing major topics, this study proposes a text-mining approach as an alternative way to navigate large volumes of COVID-19 literature. Materials and methods: We obtained 85,268 references from the NIH COVID-19 Portfolio as of November 21. After excluding articles with inadequate abstracts, 65,262 articles remained in the final corpus. We utilized natural language processing to curate and generate the term list. We applied topic modeling analyses and multiple correspondence analyses to reveal the major topics and the associations among topics, journal countries, and publication sources. Results: In our text-mining analyses of the NIH COVID-19 Portfolio, we discovered two sets of eleven major research topics by analyzing the abstracts and titles of the articles separately. The eleven major areas of COVID-19 research based on abstracts were: 1) Public Health, 2) Patient Care & Outcomes, 3) Epidemiologic Modeling, 4) Diagnosis and Complications, 5) Mechanism of Disease, 6) Health System Response, 7) Pandemic Control, 8) Protection/Prevention, 9) Mental/Behavioral Health, 10) Detection/Testing, and 11) Treatment Options. Further analyses revealed that five (2, 3, 4, 5, and 9) of the eleven abstract-based topics showed a significant correlation (ranging from moderate to weak) with title-based topics. Conclusion: By offering a more dynamic, scalable, and responsive categorization of published literature, our study provides valuable insights to the stakeholders of COVID-19 research, particularly clinicians.
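The comparison of abstract-based and title-based topic sets above can be sketched as a per-topic correlation between two document-topic weight matrices. Everything here is illustrative: the weights are random stand-ins, the topic alignment is assumed one-to-one, and Pearson correlation via NumPy substitutes for whatever correlation statistic the study used.

```python
import numpy as np

# Hypothetical per-document topic weights: rows are documents, columns
# are the 11 topics, from two separate models (abstracts vs. titles).
rng = np.random.default_rng(1)
abstract_topics = rng.dirichlet(np.ones(11), size=200)
# Title-based weights made to partially track the abstract-based ones.
title_topics = 0.6 * abstract_topics + 0.4 * rng.dirichlet(np.ones(11), size=200)

# Correlate each abstract-based topic with its title-based counterpart.
correlations = [
    float(np.corrcoef(abstract_topics[:, k], title_topics[:, k])[0, 1])
    for k in range(11)
]
```

In practice the two models' topics would first need to be matched up (e.g., by top-term overlap) before any column-wise correlation is meaningful.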

    Design of a Colorectal Cancer Data Warehouse

    Colorectal cancer researchers spend a substantial amount of effort integrating, cleansing, interpreting, and aggregating raw data from multiple sources, including health records and clinical research data. These efforts are often replicated for each project, with investigators running up against the same challenges and experiencing the same pitfalls discovered by those before them; researchers spend a substantial portion of their time on data preparation. The overall objective of this project is to design and implement a colorectal cancer data warehouse infrastructure to improve the acquisition, management, and analysis of relevant health records, clinical research, and tumor registry data from our institution and state. The current data preparation processes, at best, are inefficient, costly, time-consuming, and cumbersome. Moreover, without the needed information technology (IT) infrastructure, the potential of the ever-growing heterogeneous data accumulated in disparate data sets would remain untapped.

    Our previous colorectal cancer work included discovery and validation of biomarkers; the roles of tumor location and race/ethnicity; treatment efficacy; and prognostic/predictive models that considered the effect of molecular, demographic, epidemiological, and clinico-pathologic features on outcomes such as mortality, relapse, and survival. The data sources for these projects included exports from clinical records and spreadsheet files created for each research project. Data management for each project is usually performed in an ad-hoc manner, involving manual data entry, matching, and merging. This process is error-prone, inefficient for data reuse, and unsuitable for incorporating additional data sources.
    This work proposes to first design and implement a colorectal cancer data warehouse infrastructure that incorporates existing molecular-level and patient-level research data with a continuous data feed from the institutional enterprise data warehouse (EDW) in a multidimensional database format. We then plan to expand the scope of the colorectal cancer data warehouse to include social determinants of health (SDH) and geospatial census data, and furthermore to include state-level tumor registry data. Such a data management platform will enable us to efficiently analyze disparities among various populations, create state-wide map projections and dashboards, and analyze certain outcomes (e.g., risk and aggressiveness of the disease) to identify differences between rural and urban populations. Creating a colorectal cancer data warehouse infrastructure would allow us to store, clean, and manage the existing data sources efficiently, increase the quality and reliability of the underlying data for our research, and incorporate new data sources to facilitate future research.
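A multidimensional warehouse of the kind proposed above is conventionally organized as a star schema: dimension tables for patients and tumors, and a fact table of outcomes keyed to them. The sketch below uses SQLite for portability; every table and column name is hypothetical, a real design would add SDH, geospatial, and tumor-registry dimensions as the proposal describes.

```python
import sqlite3

# Minimal hypothetical star schema for a colorectal cancer warehouse.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE dim_patient (
    patient_key INTEGER PRIMARY KEY,
    race TEXT, ethnicity TEXT, rural_urban TEXT
);
CREATE TABLE dim_tumor (
    tumor_key INTEGER PRIMARY KEY,
    location TEXT, stage TEXT, biomarker_status TEXT
);
CREATE TABLE fact_outcome (
    patient_key INTEGER REFERENCES dim_patient,
    tumor_key INTEGER REFERENCES dim_tumor,
    relapse INTEGER, deceased INTEGER, survival_months REAL
);
""")
cur.execute("INSERT INTO dim_patient VALUES (1, 'Black', 'Non-Hispanic', 'rural')")
cur.execute("INSERT INTO dim_tumor VALUES (1, 'right colon', 'III', 'MSI-high')")
cur.execute("INSERT INTO fact_outcome VALUES (1, 1, 0, 0, 36.5)")

# Disparity-style query: an outcome measure sliced by a patient dimension,
# the kind of rural/urban comparison the proposal aims to support.
rows = cur.execute("""
    SELECT p.rural_urban, AVG(f.survival_months)
    FROM fact_outcome f JOIN dim_patient p USING (patient_key)
    GROUP BY p.rural_urban
""").fetchall()
```

The payoff of this layout is that new dimensions (census tract, registry source) can be added without restructuring the fact table, which is what makes incremental expansion of scope feasible.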

    Analyzing Workers’ Compensation Claims and Payments Made Using Data from a Large Insurance Provider

    Background: All states in the USA have established Workers’ Compensation (WC) insurance systems/programs. WC systems address key occupational safety and health concerns. This effort uses data from a large insurance provider for the years 2011–2018 to provide estimates for WC payments, stratified by claim severity, i.e., medical only vs. indemnity. Methods: Besides providing descriptive statistics, we used generalized estimating equations to analyze the association between the key injury characteristics (nature, source, and body part injured) and total WC payments made. We also provide the overall cost burden by nature of injury. Results: Of the 151,959 closed claims, 83% were medical only. The mean overall WC payment per claim, among claims that resulted in a payment, was $1,477 (SD: $7,221). Adjusted models showed that mean payments vary by claim severity. For example, among medical-only claims, the mean payment was highest for amputations ($3,849; CI: $1,396, $10,608), and among disability- and death-related claims, ruptures cost the most ($14,285; CI: $7,772, $26,255). With frequencies taken into account, however, the overall cost burden was highest for strains. Conclusions: Workplace interventions should prioritize both the average cost of claims and their frequency.
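The two summary quantities the abstract contrasts, mean payment per claim by severity and overall cost burden by injury nature, can be sketched with a toy grouping. The claim records below are hypothetical; the GEE adjustment itself (which handles correlated claims) would typically be done with a statistics package and is omitted here.

```python
from collections import defaultdict

# Hypothetical closed claims: (severity, nature_of_injury, payment_usd).
claims = [
    ("medical_only", "strain", 900.0),
    ("medical_only", "amputation", 5200.0),
    ("medical_only", "strain", 1100.0),
    ("indemnity", "rupture", 15000.0),
    ("indemnity", "strain", 8000.0),
    ("indemnity", "strain", 7000.0),
]

# Mean payment per claim, stratified by severity as in the abstract.
totals = defaultdict(lambda: [0.0, 0])
for severity, _, payment in claims:
    totals[severity][0] += payment
    totals[severity][1] += 1
mean_by_severity = {s: t / n for s, (t, n) in totals.items()}

# Overall cost burden by injury nature (frequency times cost): this is the
# quantity that lets a cheap-but-common injury like strains dominate
# despite a lower per-claim mean.
burden = defaultdict(float)
for _, nature, payment in claims:
    burden[nature] += payment
```

In this toy data, ruptures have the highest single-claim payment, but strains carry the largest total burden, mirroring the abstract's conclusion that interventions should weigh both average cost and frequency.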